This project involved the analysis of breast cancer data derived from the TCGA-BRCA project using R. Multi-omics data (RNAseq STAR counts data, microRNA data proteomics and clinical data) of (495 primary solid) tumor samples was downloaded and processed using TCGABiolinks R package. The RNAseq data were divided into two matrices, one for the long noncoding RNAs and one for the mRNA.

To prepare the dataests for MOFA algorithm, we started with features quality control, normalization, and scaling. MOFA2 (Multi- Omics Factor Analysis) R package was used for multi-omics data integration, which uses an unsupervised approach to discover the patterns that drive variation across different molecular layers. Features included in MOFA analysis were (2281 for mRNA, 1572 for miRNA, 378 for Proteome and lncRNA for 2228).

To identify differentially enriched pathways, biological processes and functions, gene set enrichment analysis was performed for mRNA and Proteome using REACTOME and C5(GO: BP) from MSIGDB. On the other hand, overrepresentation analysis (ORA) was conducted on the miRNA and lncRNA. First, differential expression analysis was performed using DESeq2 Package, and the samples were grouped based on their position in factor one of MOFA, positive and negative groups. Then, ORA was performed using TAM-2 tools for miRNA and PANTHER web tool for lncRNA. The ORA analysis used REACTOME, GO: BP, and GO: MF.

#Set Working Directory

#Libraries

library(readxl)
library(readr)
library(vsn)
library(DESeq2)
library(tidyverse)
library(ComplexHeatmap)
library(ggfortify)
library(umap)
library(ggplot2)
library(ggrepel)
library(TCGAbiolinks)
library(ComplexUpset)
library(ggupset)
library(openxlsx)

##Loading MetaData & Samples

In this section we loaded the Samples from TCGA database Link: (https://portal.gdc.cancer.gov/) using the TCGAbiolinks package Link: (https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html)

#MetaData

RNA_TCGA <- read.delim("Sample_Sheets/RNA_SampleSheet.tsv" )
miRNA_TCGA <- read.delim("Sample_Sheets/miRNA_SampleSheet.tsv" )
Methylation_TCGA <- read.delim("Sample_Sheets/Methylation_SampleSheet.tsv" )
Protein_TCGA <- read.delim("Sample_Sheets/Protein_SampleSheet.tsv" )

We Found 1207 , 919 and 1231 TCGA-BRCA Samples in miRNA , Protein and RNA databases respectively. To figure out the common samples w did the following steps..

Intersect_RNA_miRNA <- intersect(RNA_TCGA$Case.ID , miRNA_TCGA$Case.ID )
Intersect_Meth_Prot <- intersect(Methylation_TCGA$Case.ID , Protein_TCGA$Case.ID)
barcodes <- intersect(Intersect_RNA_miRNA , Intersect_Meth_Prot)

629 Samples have all the four omics. By Sample ID we will subset the Omics metadata

df <- data.frame(Case.ID = barcodes)
dim(df)
## [1] 629   1
#Subsetting the Common Samples from the BulkRNA Samples
RNA_Samples <- RNA_TCGA[which(RNA_TCGA$Case.ID %in% df$Case.ID),]
table(RNA_Samples$Sample.Type)
## 
##          Metastatic       Primary Tumor Solid Tissue Normal 
##                   5                 640                  86
barplot(table(RNA_Samples$Sample.Type) , space = 0 , horiz = F, main = "RNA Sample Types")

640 Primary Tumor Samples are our target, and this indicates that there are some patients have more than one sample and others have metastatic and adjasent normal extracted samples.

#Subsetting the Common Samples from the miRNA Samples
miRNA_Samples <- miRNA_TCGA[which(miRNA_TCGA$Case.ID %in% df$Case.ID),]
table(miRNA_Samples$Sample.Type)
## 
##          Metastatic       Primary Tumor Solid Tissue Normal 
##                   5                 639                  78
barplot(table(miRNA_Samples$Sample.Type) , space = 0 , horiz = F , main = "mi-RNA Sample Types")

639 Primary Tumor Samples are our target, and this indicates that there are some patients have more than one sample and others have metastatic and adjasent normal extracted samples.

#Subsetting the Common Samples from the Protein Samples
Protein_Samples <- Protein_TCGA[which(Protein_TCGA$Case.ID %in% df$Case.ID),]
table(Protein_Samples$Sample.Type)
## 
##          Metastatic       Primary Tumor Solid Tissue Normal 
##                   4                 629                  28
barplot(table(Protein_Samples$Sample.Type) , space = 0 , horiz = F , main = "Protein Sample Types")

629 Primary Tumor Samples are our target, and this indicates that there are some patients have more than one sample and others have metastatic and adjasent normal extracted samples.

#Load Metadata

load('metaData')

#Upset Plot

#Upset plot
upset_list <- list()
for(sample in unique(Clinical_data_ann$bcr_patient_barcode)){
  exist <- c()
  if (sample %in% unique(RNA_TCGA$Case.ID)){
    exist <- c(exist, 'BulkRNA')
  }
  if(sample %in% unique(Protein_TCGA$Case.ID)){
    exist <- c(exist, 'Protein')
  }
  if(sample %in% unique(Methylation_TCGA$Case.ID)){
    exist <- c(exist, 'Methylation')
  }
  if(sample %in% unique(miRNA_TCGA$Case.ID)){
    exist <- c(exist, 'miRNA')
  }else{
    exist <- c(exist, NA)
  }
  upset_list[[sample]] <- exist
}
upset_tibble <- data.frame(sample = unique(Clinical_data_ann$bcr_patient_barcode))
upset_tibble <- as_tibble(upset_tibble)
upset_tibble$list <- upset_list
upset_tibble %>% ggplot(aes(x=list)) +
  geom_bar() +
  xlab('Omics Type')+
  scale_x_upset(n_intersections = 10)+
  scale_x_mergelist(sep = "-") +
  axis_combmatrix(sep = "-") +
  scale_x_upset(order_by = "freq")
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

Only 629 Samples has four Omics, However the samples which will be included in the analysis will be further sub-setted because the selected metdata lacks some info about the breast cancer subtypes of some samples.

rm(upset_list , upset_tibble , df , Methylation_TCGA , miRNA_TCGA , RNA_TCGA , Protein_TCGA)
rm(exist , Intersect_Meth_Prot , Intersect_RNA_miRNA , sample)

Save those Common Samples as txt format

#Save the Samples as txt file 
write.table(barcodes, 'Common_Samples.txt' , quote = F , row.names = F , col.names = F)
metadata <- Clinical_data_ann

rename rows of metadata by sample IDs for further sub-setting

row.names(metadata) <- metadata$bcr_patient_barcode

We sub-setted the metadata to only 628 samples as one of the 629 is not avilable in this metadata

indexes=match(barcodes,rownames(metadata))
indexes= indexes[!is.na(indexes)]
metadata <- metadata[indexes , ]

Explore Breast Cancer sub-types

barplot(table(metadata$IMS), main = "Breast Cancer Subtypes in selected Samples")

This Barplot shows that selected samples include: Four breast cancer sub-types (Basal: 129 samples , Her2: 91 samples , LumA: 197 samples , LumB: 130 samples)

Check if all samples are included in each Omic metadata

sum(metadata$bcr_patient_barcode %in% RNA_Samples$Case.ID)
## [1] 628
sum(metadata$bcr_patient_barcode %in% miRNA_Samples$Case.ID)
## [1] 628
sum(metadata$bcr_patient_barcode %in% Protein_Samples$Case.ID)
## [1] 628

#BulkRNA

We used TCGAbiolinks with following arguments to download the selected 628 Bulk-RNA data from GDC database

#Create the selected samples query
BulkRNA_query <- GDCquery(project = 'TCGA-BRCA',
                          data.category = 'Transcriptome Profiling',
                          data.type = 'Gene Expression Quantification',
                          sample.type = 'Primary Tumor',
                          barcode = barcodes)
#Download the samples files
GDCdownload(BulkRNA_query)

Read the downloaded files to form the Gene expression matrix

##### load the mRNA-Seq data #####
data.path="GDCdata/TCGA-BRCA/harmonized/Transcriptome_Profiling/Gene_Expression_Quantification"
files <- list.files(path=data.path,recursive=T, pattern = "tsv")
file=files[1]
filepath=file.path(data.path,files[1])
colnames= unlist( strsplit ( readLines(filepath ,n=2) [2] ,"\t"))
temp <- read.table(filepath, header=F,skip = 6)
names(temp)=colnames
head(temp)
exp=temp[,c(1,2,3)]

for(i in 1: length(files))
{
  file=files[i]
  file.id=strsplit(file,"/")[[1]][1]
  filepath=file.path(data.path,files[i])
  temp <- read.table(filepath, header=F,skip = 6)
  colnames(temp)[4]=file.id
  exp=cbind(exp,temp[4])
}
head(exp)

Select only protein coding transcripts

exp_pc <- exp%>%filter(gene_type == "protein_coding")

Check for duplicated transcripts that corresponds to the same gene

# check duplciation of of gene symbols?  
x=duplicated(exp_pc$gene_name)  
sum(x)
exp.data=exp_pc[,4:dim(exp_pc)[2]]
exp.data=apply(exp.data,2, as.numeric)
head(exp.data)

Remove Duplication by aggregation of the duplicated transcripts by taking there mean value expression into one gene

#### remove  duplication by aggregation
exp.data.agg= aggregate(exp.data, list(exp_pc$gene_name),FUN=mean)
rownames(exp.data.agg)=exp.data.agg$Group.1
exp.data.agg=exp.data.agg[-1]
exp.data.agg=round(exp.data.agg)
file.ids=colnames(exp.data.agg)
View(exp.data.agg)

We renamed the samples with there TCGA sample ID instead of the file IDs

###### load the mrna sample sheets  # sample sheets
pheno <- RNA_Samples
View(pheno)
colnames(pheno) = sub(" " , "." , colnames(pheno))
file.ids.pheno=pheno$File.ID
index.files=match(file.ids,file.ids.pheno)

pheno <- pheno[index.files, ]
table(pheno$`Sample.Type`)
file.ids.pheno=pheno$`File.ID`
index.files=match(file.ids,file.ids.pheno)
names(exp.data.agg)=pheno$Case.ID[index.files]
BulkRNA.data <- exp.data.agg[,metadata$bcr_patient_barcode]

We saved the read-count expression matrix as RDATA file

save(BulkRNA.data , file = "BulkRNA.data.RDATA")

Load

load("BulkRNA.data.RDATA")

we filtered Genes with zero variance across all samples to reduce dimensions of the Genes that have no variance explaining the difference between breast cancer sub types

genesVariance <- as.data.frame( apply( BulkRNA.data, 
                                       MARGIN = 1 , 
                                       var , 
                                       simplify = T))
names(genesVariance) = 'Variance'
vFilterGenes <- genesVariance %>% filter(Variance > 0 )
vFilterGenes <- row.names(vFilterGenes)
BulkRNA.data <- BulkRNA.data[vFilterGenes, ]

remove low quality genes (i.e. genes that have total expression lower than 10 in all 628 samples)

keep <- rowSums(BulkRNA.data) >= 10
BulkRNA.data <- BulkRNA.data[keep,]

We estimated size factors and variance stabilization transformation

dds = DESeqDataSetFromMatrix( countData = BulkRNA.data, 
                              colData = metadata , 
                              design = ~gender)

dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)
View(ntd)
hist(genesVariance$Variance)

Filter genes with highest variance according to specific variance cut-off extracted from the shown histogram

Variance_Cutoff <- 15e+6
vFilterGenes <- genesVariance %>% arrange(desc(Variance)) %>% filter(Variance >= Variance_Cutoff)
vFilterGenes <- row.names(vFilterGenes)
ntd <- ntd[vFilterGenes,]

Log2 Normalization was done to normalize the data

BulkRNA.data.log= log2(ntd + 1)

we Scaled the matrix by Genes to be ready for linear combination model by MOFA

BulkRNA.data.log.scale <- scale(t(BulkRNA.data.log) , center = TRUE, scale = TRUE)
BulkRNA.data.log.scale <- as.data.frame(t(BulkRNA.data.log.scale))

We saved the normalized and scaled gene expression matrix to RDATA file

save(BulkRNA.data.log , BulkRNA.data.log.scale , metadata , file = "BulkRNA.data_.RDATA")

Load

load("BulkRNA.data_.RDATA")

Draw Plots of the data before and after normalization and scaling

options(repr.plot.width=20,repr.plot.height=8)

par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(BulkRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(BulkRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(BulkRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')

We saved the read-count gene expression matrix to be used for selecting long non-coding RNA

save(exp , RNA_Samples , file = "ln_material.RDATA")
load("ln_material.RDATA")

#miRNA We used TCGAbiolinks with following arguments to download the selected 628 mi-RNA data from GDC database

miRNA_query <- TCGAbiolinks::GDCquery(project = 'TCGA-BRCA',
                                      data.category = 'Transcriptome Profiling',
                                      data.type = 'miRNA Expression Quantification',
                                      sample.type = 'Primary Tumor',
                                      barcode = barcodes)

GDCdownload(miRNA_query)

Read the downloaded files to form the miRNA expression matrix

data.path="GDCdata/TCGA-BRCA/harmonized/Transcriptome_Profiling/miRNA_Expression_Quantification"
files <- list.files(path=data.path,recursive=T, pattern = "txt")
file=files[1]
filepath=file.path(data.path,files[1])
colnames= unlist( strsplit ( readLines(filepath ,n=2) [1] ,"\t"))
temp <- read.table(filepath, header=T)
exp_mi=temp[,c(1)]
for(i in 1: length(files))
{
  file=files[i]
  file.id=strsplit(file,"/")[[1]][1]
  filepath=file.path(data.path,files[i])
  temp <- read.table(filepath, header=T)
  colnames(temp)[2]=file.id
  exp_mi=cbind(exp_mi,temp[2])
}
row.names(exp_mi) = exp_mi$exp_mi
exp_mi = exp_mi[-1]
file.ids=colnames(exp_mi)

#miRNA.data = exp_mi
file.ids.pheno=miRNA_Samples$File.ID
index.files=match(file.ids,file.ids.pheno)
names(exp_mi)=miRNA_Samples$Case.ID[index.files]
miRNA.data=exp_mi
indexes=match(metadata$bcr_patient_barcode,colnames(miRNA.data))
indexes= indexes[!is.na(indexes)]
miRNA.data <- miRNA.data[,indexes]

We saved the read count miRNA expression matrix as RDATA file

save(miRNA.data , file = "miRNA.data.RDATA")

Load

load("miRNA.data.RDATA")

We removed miRNA features with zero variance

miRNAVariance <- as.data.frame( apply( miRNA.data, 
                                       MARGIN = 1 , 
                                       var , 
                                       simplify = T))
names(miRNAVariance) = 'Variance'
vFiltermiRNA <- miRNAVariance %>% filter(Variance > 0 )
vFiltermiRNA <- row.names(vFiltermiRNA)

miRNA.data <- miRNA.data[vFiltermiRNA, ]

We calculated the estimated size factors and variane stabilizing transformation for removing of the biological and techniqal sequencing errors

dds = DESeqDataSetFromMatrix( countData = miRNA.data, 
                              colData = metadata[colnames(miRNA.data),], 
                              design = ~gender)

dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)

we log normalized the miRNA expression matrix

miRNA.data.log= log2(ntd + 1)

We scaled the miRNA normalized expression matrix by feature to be introduced to MOFA

miRNA.data.log.scale <- scale(t(miRNA.data.log) , center = TRUE, scale = TRUE)
miRNA.data.log.scale <- as.data.frame(t(miRNA.data.log.scale))

We saved the normalized and scaled matrices as RDATA file

save(miRNA.data.log , miRNA.data.log.scale , metadata , file = "miRNA.data_.RDATA")

Load

load("miRNA.data_.RDATA")
options(repr.plot.width=20,repr.plot.height=8)

par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(miRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(miRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(miRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')

#lnRNA We filtered the read count gene expression matrix for only lnc RNA

exp_lnc <- exp%>%filter(gene_type == "lncRNA")

We checked for duplicated transcripts

# check duplciation   
x=duplicated(exp_lnc$gene_name)  
sum(x)

We aggregated the duplicated transcripts by taking there mean values

exp.data=exp_lnc[,4:dim(exp_lnc)[2]]
exp.data=apply(exp.data,2, as.numeric)
View(exp.data)
#### remove  duplication by aggregation
exp.data.agg= aggregate(exp.data, list(exp_lnc$gene_name),FUN=mean)
rownames(exp.data.agg)=exp.data.agg$Group.1
exp.data.agg=exp.data.agg[-1]
exp.data.agg=round(exp.data.agg)
file.ids=colnames(exp.data.agg)
View(exp.data.agg)

We renamed the lncRNA expression matrix by sample IDs instead of File IDs

###### load the mrna sample sheets  # sample sheets
pheno <- RNA_Samples
View(pheno)
colnames(pheno) = sub(" " , "." , colnames(pheno))
file.ids.pheno=pheno$File.ID
index.files=match(file.ids,file.ids.pheno)

pheno <- pheno[index.files, ]
table(pheno$`Sample.Type`)
file.ids.pheno=pheno$`File.ID`
index.files=match(file.ids,file.ids.pheno)
names(exp.data.agg)=pheno$Case.ID[index.files]
lncRNA.data <- exp.data.agg[,metadata$bcr_patient_barcode]

Load

load("lncRNA.data_.RDATA")

We removed the lncRNA features with zero variance

lncRNAVariance <- as.data.frame( apply( lncRNA.data, 
                                       MARGIN = 1 , 
                                       var , 
                                       simplify = T))
names(lncRNAVariance) = 'Variance'
vFilterlncRNA <- lncRNAVariance %>% filter(Variance > 0 )
vFilterlncRNA <- row.names(vFilterlncRNA)
lncRNA.data <- lncRNA.data[vFilterlncRNA, ]

We estimated the size factors and calculated the variance stabilizing transformation

dds = DESeqDataSetFromMatrix( countData = lncRNA.data, 
                              colData = metadata , 
                              design = ~race)

dds <- estimateSizeFactors(dds)
dds <- varianceStabilizingTransformation(dds)
ntd <- assay(dds)

We selected the lncRNA features with highest variance according to specific cut-off to be with suitable dimension while running MOFA algorism

Variance_Cutoff <- 2000
vFilterlncRNA <- lncRNAVariance %>% arrange(desc(Variance)) %>% filter(Variance >= Variance_Cutoff)
vFilterlncRNA <- row.names(vFilterlncRNA)
ntd <- ntd[vFilterlncRNA,]

We normalized the lnRNA expression matrix after filtration

lncRNA.data.log= log2(ntd + 1)

We scaled the matrix by features

lncRNA.data.log.scale <- scale(t(lncRNA.data.log) , center = TRUE, scale = TRUE)
lncRNA.data.log.scale <- as.data.frame(t(lncRNA.data.log.scale))

We saved the lnRNA normalized and scaled data to RDATA file

save(lncRNA.data , lncRNA.data.log , lncRNA.data.log.scale , file = "lncRNA.data_.RDATA")
options(repr.plot.width=20,repr.plot.height=8)

par(mar = c(8,5,2,2),mfrow=c(1,3),cex.axis=1.5)
plot(density(apply(lncRNA.data, 2, mean, na.rm = TRUE)),main='befor log2')
plot(density(apply(lncRNA.data.log, 2, mean, na.rm = TRUE)),main='after log2')
plot(density(apply(lncRNA.data.log.scale, 2, mean, na.rm = TRUE)),main='after log2 + scale')

#Proteins We created GDCquery for protein samples by using TCGAbiolinks R package

query <- GDCquery(project = "TCGA-BRCA",
                      data.category = "Proteome Profiling",
                      access = "open",
                      legacy = FALSE,
                       barcode = barcodes,
                      platform = "RPPA",
                      data.type = "Protein Expression Quantification" ,
                      sample.type =  "Primary Tumor")
                      
View(getResults(query))

we run GDC download and prepare to form the protein expression matrix

GDCdownload(query)

We saved the normalized and scaled protein matrix as RDATA file

Protein.data.log.scale <- GDCprepare (query, save.filename =  "RPPA_TCGA-BRCA", save = TRUE)
save(Protein.data.log.scale, file = "Protein.data.log.scale_.RDATA")

As each feature in the protein matrix is protein ID we annotated each feature into Gene ID for further enrichment analysis by using AGID annotation data base to select the corresponding gene name

# read the loaded annotation for AGID
AGID <- read_delim("AGID.tsv", delim = "\t", escape_double = FALSE, 
                               trim_ws = TRUE)
AGID_ <- AGID %>% select(AGID , gene_name)
#1. set AGID name to one gene 
AGID_$gene_name<- sub ("/.*", "", AGID_$gene_name)
#2. set rowname for RPPA
Protein.data.log.scale<- merge(Protein.data.log.scale , AGID_ , by= "AGID")

We removed the duplicated gene names by taking their mean values

Protein.data.log.scale=Protein.data.log.scale[,6:dim(Protein.data.log.scale)[2]]
Protein.data.log.scale_num=apply(Protein.data.log.scale,2, as.numeric)
View(Protein.data.log.scale_num)
#### remove  duplication by aggregation
Protein.data.log.scale_agg= aggregate(Protein.data.log.scale_num, list(Protein.data.log.scale$gene_name),FUN=mean)
Protein.data.log.scale_agg <- Protein.data.log.scale_agg[-dim(Protein.data.log.scale_agg)[2]]
rownames(Protein.data.log.scale_agg)=Protein.data.log.scale_agg$Group.1
Protein.data.log.scale_agg=Protein.data.log.scale_agg[-1]
View(Protein.data.log.scale_agg)

We Ckecked the protein feature variances

num.mat <- as.matrix( Protein.data.log.scale_agg)
vars<-  rowVars(num.mat, na.rm = TRUE) 

We Detectd variances equal to zero or NA, and remove them from the main matrix

zero_var<- which( vars== 0 | is.na(vars))
length (zero_var)
 sum (vars == 0) 

Now we have 22 feature with zero variance

num.mat<- num.mat[ - zero_var, ]

Missings imputation We Computed all missinigness rate in proteins

#all_elmnts
nrow(num.mat) * ncol(num.mat)
# NAs_elmnts
sum(is.na(num.mat))
# # all missingness percentage
sum(is.na(num.mat))/(nrow(num.mat) * ncol(num.mat))

round(sum(is.na(num.mat))/(nrow(num.mat) * ncol(num.mat)) *100,1)

so data with 7.6 missing data

we Computed missinigness rate in each feature

missd_features<- rowSums( is.na(num.mat))/ nrow(num.mat)*100
sum (missd_features> 50)
quantile(missd_features)

We visulalized missings rates

## draw histogram for missingness rate for each feature
h=hist( missd_features,
        breaks=10,
        main="",
        xlab="percentage of missingness")
# add counts to each break
text(h$mids,h$counts,labels=h$counts, adj=c(0.5, -0.5))
# the cutoff will be 50%
abline(v = 50)
good.inx=apply(is.na(num.mat), 1, sum)/ncol(num.mat) <0.2
sum(good.inx)
num.mat.good<- num.mat[good.inx, ]
library(impute)

Here we have no samples with more than 50% missing rate we imputed missingness by K nearest neighbour algorism by using impute.knn function fro impute R package

num.mat.imputed<- impute.knn(num.mat.good)$data

Confirm that we dot have missed data

sum (rowSums(is.na(num.mat.imputed)))
Protein.data.log.scale <- num.mat.imputed
Protein.data.log.scale <- scale(t(Protein.data.log.scale) , center = TRUE, scale = TRUE)
Protein.data.log.scale <- as.data.frame(t(Protein.data.log.scale))

We renamed the samples with sample IDs

col_names <- c() 
for(name in colnames(Protein.data.log.scale)){
  x <- str_split(name , pattern = "-")
  X <- paste0(x[[1]][1] ,"-" ,x[[1]][2],"-" , x[[1]][3])
  col_names = c(col_names , X)
}
colnames(Protein.data.log.scale) <- col_names

We saved the normalized and scaled protein matrix as RDATA file

save(Protein.data.log.scale , file = "Protein_data_.RDATA")

Load

load("Protein_data_.RDATA")

#MOFA Further sub-setting for the matrices will be through samples which are corresponding to breast cancer sub-types from the metadata (not adjacent normal samples)

FilteredMetaData <- as.data.frame(metadata %>% filter (IMS != 'Normal' & !is.na(IMS)))
FilteredMetaData <- FilteredMetaData[colnames(miRNA.data.log.scale),]
FilteredMetaData <- FilteredMetaData[complete.cases(FilteredMetaData),]
table(FilteredMetaData$IMS)

We Sub-setted the four matrices and prepared them for MOFA

miRNA <- miRNA.data.log.scale[,row.names(FilteredMetaData)]
miRNA_row.names <- row.names(miRNA)
miRNA <- apply(miRNA,2, as.numeric)
row.names(miRNA) = miRNA_row.names

RNA <- BulkRNA.data.log.scale[,row.names(FilteredMetaData)]
RNA_row.names <- row.names(RNA)
RNA <- apply(RNA,2, as.numeric)
row.names(RNA) = RNA_row.names

lncRNA <- lncRNA.data.log.scale[,row.names(FilteredMetaData)]
lncRNA_row.names <- row.names(lncRNA)
lncRNA <- apply(lncRNA,2, as.numeric)
row.names(lncRNA) = lncRNA_row.names

Protein <- Protein.data.log.scale[,row.names(FilteredMetaData)]
Protein_row.names <- row.names(Protein)
Protein <- apply(Protein,2, as.numeric)
row.names(Protein) = Protein_row.names
library(MOFA2)
library(MOFAdata)
library(data.table)

Load the MOFA Object

load('MOFA_BRCA.RDATA')

upload the matrices to a list

Breast_data <- list(Genes = RNA,
                    miRNA = miRNA,
                    lncRNA = lncRNA,
                    Protein = Protein)

Create and visualize MOFA object

MOFAobject <- create_mofa(Breast_data)
plot_data_overview(MOFAobject)

Select MOFA parameters for the model (30 factors are selected after trying different numbers as the best estimaed variance by each omic)

data_opts <- get_default_data_options(MOFAobject)
model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 30
train_opts <- get_default_training_options(MOFAobject)
train_opts$convergence_mode <- "slow"
train_opts$seed <- 42
train_opts$maxiter <- 1000
MOFAobject <- prepare_mofa(MOFAobject,
                           data_options = data_opts,
                           model_options = model_opts,
                           training_options = train_opts
)
MOFAobject <- run_mofa(MOFAobject)

Add metadata to the MOFA Object

Total_pheno <- metadata[row.names(FilteredMetaData),]
Total_pheno$sample <- Total_pheno$bcr_patient_barcode
samples_metadata(MOFAobject) <- Total_pheno

Save the MOFA object

save(MOFAobject , file = "MOFA_BRCA.RDATA")

Plot Factor correlation

plot_factor_cor(MOFAobject)

Non of the 30 factors are higly correlated to each other

Plot variance ration explaind by each factor accros the four omics

plot_variance_explained(MOFAobject, max_r2=10)

Factors 1 and 3 explain the variance using the four omics with the highest ration for Genes and lncRNA. while factor 2 is specific for Genes and lncRNA features. Factors 4 and 5 only explains variance bwteen samples by using the Protein features. other factors explains lower ratios.

Variance_Explained <-  MOFAobject@cache$variance_explained$r2_per_factor
Variance_Explained <- as.data.frame(Variance_Explained)
Variance_Explained
##          group1.Genes group1.miRNA group1.lncRNA group1.Protein
## Factor1  12.016674731   2.56949929   9.888162736     5.47541969
## Factor2  12.708122032   0.41444172   5.629247491     0.69203583
## Factor3   6.806844119   2.13510060   3.611907712     2.83954086
## Factor4   0.001782687   0.00173897   0.001106563    14.62200619
## Factor5   0.045955985   0.18939580   0.046665365    10.62746466
## Factor6   4.416843067   1.00028579   3.001321935     1.53409754
## Factor7   2.154266082   0.77097387   1.755995737     1.86625015
## Factor8   2.520651096   0.57101505   2.016878282     0.99117344
## Factor9   1.966321466   0.55410141   1.464950856     0.68734177
## Factor10  2.749176068   0.18430931   1.207922621     0.16068761
## Factor11  1.595826453   0.47570821   1.523711294     0.68442115
## Factor12  1.059913447   0.65292315   0.849573109     1.51030911
## Factor13  1.389206660   0.36780472   1.893903902     0.37126287
## Factor14  0.032076359   2.48546806   0.031600010     0.39594611
## Factor15  0.616893772   0.84118401   0.457211354     0.82724650
## Factor16  1.062840268   0.27068094   0.923513367     0.37651265
## Factor17  1.840139533   0.12602967   0.476650149     0.17629967
## Factor18  0.362660524   0.17904999   1.817674746     0.22367164
## Factor19  1.118038532   0.14845793   0.841035482     0.42597099
## Factor20  0.562255765   0.96154716   0.569168003     0.27639691
## Factor21  0.710417102   0.24845109   0.849524529     0.49881563
## Factor22  0.719500944   0.37399960   0.596665642     0.38498971
## Factor23  1.003499944   0.21267797   0.651121724     0.18532291
## Factor24  0.842638748   0.14746152   0.759709395     0.26894959
## Factor25  0.755931575   0.21840670   0.614697301     0.39581000
## Factor26  0.859126156   0.22593288   0.730424815     0.11567401
## Factor27  1.098277816   0.18818156   0.327304423     0.14161400
## Factor28  0.608015270   0.32159671   0.624949379     0.18530124
## Factor29  0.635437267   0.28986551   0.679502876     0.08282037
## Factor30  0.579915085   0.18822674   0.535301945     0.37484299
plot_factors(MOFAobject, 
             factors = 1:3,
             color_by = 'IMS'
)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

plot_variance_explained(MOFAobject, plot_total = T)[[2]]

As total variance explained, Genes and priteins explain the highest variance among the selected samples.

plot_factor(MOFAobject, 
            factors = c(1,2,3,4), 
            color_by = "IMS"
)
## Warning: `fct_explicit_na()` was deprecated in forcats 1.0.0.
## ℹ Please use `fct_na_value_to_level()` instead.
## ℹ The deprecated feature was likely used in the MOFA2 package.
##   Please report the issue at <https://github.com/bioFAM/MOFA2>.

Factor 1 explains the variance between the Basal breast cancer subtype and the Luminal ones. while other factors cannot explaint the variance between the selected subtypes.

plot_factor(MOFAobject, 
            factors = c(5,6,7,8), 
            color_by = "IMS"
)

plot_data_heatmap(MOFAobject, 
                  view = "miRNA",
                  factor = 1,  
                  features = 100,
                  cluster_rows = TRUE, cluster_cols = TRUE,
                  show_rownames = FALSE, show_colnames = FALSE,
                  scale = "none"
)

plot_data_heatmap(MOFAobject, 
                  view = "Genes",
                  factor = 1,  
                  features = 50,
                  cluster_rows = FALSE, cluster_cols = TRUE,
                  show_rownames = TRUE, show_colnames = TRUE,
                  scale = "none"
)

plot_data_heatmap(MOFAobject, 
                  view = "Protein",
                  factor = 1,  
                  features = 50,
                  cluster_rows = FALSE, cluster_cols = TRUE,
                  show_rownames = TRUE, show_colnames = TRUE,
                  scale = "none"
)

plot_data_heatmap(MOFAobject, 
                  view = "lncRNA",
                  factor = 1,  
                  features = 50,
                  cluster_rows = FALSE, cluster_cols = TRUE,
                  show_rownames = TRUE, show_colnames = TRUE,
                  scale = "none"
)

Total_pheno <- as.data.frame(MOFAobject@samples_metadata)
colnames(MOFAobject@samples_metadata)
##  [1] "X"                                   "bcr_patient_barcode"                
##  [3] "type"                                "age_at_initial_pathologic_diagnosis"
##  [5] "gender"                              "race"                               
##  [7] "ajcc_pathologic_tumor_stage"         "clinical_stage"                     
##  [9] "histological_type"                   "histological_grade"                 
## [11] "initial_pathologic_dx_year"          "menopause_status"                   
## [13] "birth_days_to"                       "vital_status"                       
## [15] "tumor_status"                        "last_contact_days_to"               
## [17] "death_days_to"                       "cause_of_death"                     
## [19] "new_tumor_event_type"                "new_tumor_event_site"               
## [21] "new_tumor_event_site_other"          "new_tumor_event_dx_days_to"         
## [23] "treatment_outcome_first_course"      "margin_status"                      
## [25] "residual_tumor"                      "OS"                                 
## [27] "OS.time"                             "DSS"                                
## [29] "DSS.time"                            "DFI"                                
## [31] "DFI.time"                            "PFI"                                
## [33] "PFI.time"                            "Redaction"                          
## [35] "IMS"                                 "ethnicity"                          
## [37] "Ethnicity_reported"                  "Ethnicity_reported_simple"          
## [39] "ICRscore"                            "HML.ICR.Cluster"                    
## [41] "ICR_ES"                              "IMS_Mathews"                        
## [43] "Assigned_Ethnicity"                  "Assigned_Ethnicity_simplified"      
## [45] "sample"                              "group"
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:IRanges':
## 
##     desc
## The following object is masked from 'package:stats':
## 
##     filter

Statistical Anova test shows that factor one also significantly correlated with the race of the samples which may affect the variance explained by factor 1 due to biological differences (Can be better visualized)

Zmatrix <- as.data.frame(MOFAobject@expectations$Z)
for(i in 1){
  df <- data.frame(factor = as.numeric(Zmatrix[,i]) , obj = Total_pheno[,"race"])
  Ttest <- df %>% anova_test(factor~obj)
  print(Ttest)
}
## ANOVA Table (type II tests)
## 
##   Effect DFn DFd      F     p p<.05   ges
## 1    obj   3 491 12.735 5e-08     * 0.072

Anova Test shown significant correlation between race and samples values in Factor1

Zmatrix <- as.data.frame(MOFAobject@expectations$Z)
for(i in 1){
  df <- data.frame(factor = as.numeric(Zmatrix[,i]) , obj = Total_pheno[,"IMS"])
  Ttest <- df %>% anova_test(factor~obj)
  print(Ttest)
}
## ANOVA Table (type II tests)
## 
##   Effect DFn DFd       F         p p<.05   ges
## 1    obj   3 491 573.403 5.59e-160     * 0.778

Anova Test shown significant correlation between breast cancer subtypes and samples values in Factor1

MOFAobject@samples_metadata$age_diagnosis <-
MOFAobject@samples_metadata$age_at_initial_pathologic_diagnosis
correlate_factors_with_covariates(MOFAobject, 
                                  covariates = c("age_diagnosis","birth_days_to","last_contact_days_to"), 
                                  #plot="log_pval"
                                  plot = "r"
)
## Warning in correlate_factors_with_covariates(MOFAobject, covariates =
## c("age_diagnosis", : There are non-numeric values in the covariates data.frame,
## converting to numeric...

correlate_factors_with_covariates(MOFAobject, 
                                  covariates = c("age_diagnosis","birth_days_to","last_contact_days_to" ), 
                                  plot="log_pval"
                                  #plot = "r"
)
## Warning in correlate_factors_with_covariates(MOFAobject, covariates =
## c("age_diagnosis", : There are non-numeric values in the covariates data.frame,
## converting to numeric...

Factor1 samples are slightly correlated with days to birth and age at diagnosis

p <- plot_factors(MOFAobject, 
                  factors = c(1,3), 
                  color_by = "IMS",
                  dot_size = 2.5,
                  show_missing = T
)

p <- p + 
  #geom_hline(yintercept=0.1, linetype="dashed") +
  geom_vline(xintercept=(-0.2), linetype="dashed")+
  geom_vline(xintercept=(1.2), linetype="dashed")

print(p)

Factor1 against Factor3 reveals that Factor1 can also explain variance between Basal and Her2 subtypes for future analysis

#Genes Results

plot_top_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 20,
 sign = 'positive',# Top number of features to highlight
 scale = T           # Scale weights from -1 to 1
)

plot_top_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 20, 
 sign = "negative",
 scale = F           # Scale weights from -1 to 1
)

#miRNA Results

plot_top_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 20,
 sign = 'positive',# Top number of features to highlight
 scale = T           # Scale weights from -1 to 1
)

plot_top_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 20, 
 sign = "negative",
 scale = T           # Scale weights from -1 to 1
)

#Protein Results

plot_top_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 20,
 sign = 'positive',# Top number of features to highlight
 scale = T           # Scale weights from -1 to 1
)

plot_top_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 20, 
 sign = "negative",
 scale = T           # Scale weights from -1 to 1
)

#Enrichment

library(biomaRt)
RNA_Features <- features_names(MOFAobject)[["Genes"]]
lncRNA_Features <- features_names(MOFAobject)[["lncRNA"]]
miRNA_Features <- features_names(MOFAobject)[["miRNA"]]
Protein_Features <- features_names(MOFAobject)[["Protein"]]
utils::data(reactomeGS)
utils::data("MSigDB_v6.0_C2_human") 

#Genes - MOFA - Reactome

Gene_ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
attributes<- listAttributes (Gene_ensembl)
Genes_ensembl_IDs <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), 
                     filters = "external_gene_name", 
                     values = RNA_Features, 
                     mart = Gene_ensembl)
indexes <- match(colnames(reactomeGS) , Genes_ensembl_IDs$ensembl_gene_id)
dim(reactomeGS)
sum(!is.na(indexes))
indexes <- !is.na(indexes)
reactome <- reactomeGS[,indexes]
indexes <- match(colnames(reactome) , Genes_ensembl_IDs$ensembl_gene_id)
col_names = Genes_ensembl_IDs$external_gene_name[indexes]
colnames(reactome) <- col_names
save(reactome , file = "reactomeDB.RDATA")
load("reactomeDB.RDATA")

#MOFA - Genes

# GSEA on positive weights, with default options
res.positive <- run_enrichment(MOFAobject, 
  feature.sets = reactome,  
  view = "Genes",
  sign = "positive"
)

# GSEA on negative weights, with default options
res.negative <- run_enrichment(MOFAobject, 
  feature.sets = reactome, 
  view = "Genes",
  sign = "negative"
)
plot_enrichment(res.positive, 
  factor = 1, 
  max.pathways = 15,
  alpha = 0.25
)

plot_enrichment(res.negative, 
  factor = 1, 
  max.pathways = 15,
  alpha = 0.25
)

#Final Excel Sheet

OUT <- createWorkbook()

#Genes - GSEA - Software

#Prpare the gene set 
GSEA_input <- as.data.frame(MOFAobject@expectations$W$Genes[,'Factor1'])
colnames(GSEA_input) <- 'Weights'
#Rename the Genes to remove _Genes
indexes<- which(grepl(pattern = "_Genes" , x = row.names(GSEA_input) ))
Genes_toBeReplaced <- row.names(GSEA_input)[indexes]
Genes_toBeReplaced <- gsub(pattern = "_Genes" , "" , Genes_toBeReplaced)
row.names(GSEA_input)[indexes] <- Genes_toBeReplaced
#Add Genes Column
GSEA_input$Genes <- row.names(GSEA_input)
GSEA_input <- GSEA_input[,c('Genes' , 'Weights')]
GSEA_input <- GSEA_input %>% arrange(desc(Weights))
write.table(GSEA_input , sep = "\t",file = 'GSEA_input.rnk' , quote = FALSE , row.names = FALSE)

Reactome

GSEA_positive_Output_Reactome <- read.delim("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/gsea_report_for_na_pos_1679677635779.tsv") %>% filter(NOM.p.val <= 0.05)%>% dplyr::arrange(NOM.p.val) %>% dplyr::select(NAME , NOM.p.val) 

ggplot(GSEA_positive_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

addWorksheet(OUT , 'GSEA_positive_Output_Reactome')
writeData(OUT, sheet = "GSEA_positive_Output_Reactome", x = GSEA_positive_Output_Reactome)

Select Pathways

df <- GSEA_positive_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_INTERFERON_GAMMA_SIGNALING" ,
                       "REACTOME_KERATINIZATION",
                       "REACTOME_ANTIGEN_PROCESSING_CROSS_PRESENTATION",
                       "REACTOME_NEUTROPHIL_DEGRANULATION",
                       "REACTOME_INFLUENZA_INFECTION",
                       "REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM",
                       "REACTOME_COLLAGEN_DEGRADATION",
                       "REACTOME_DEGRADATION_OF_THE_EXTRACELLULAR_MATRIX",
                       "REACTOME_CELL_CYCLE_MITOTIC",
                       "REACTOME_SARS_COV_2_MODULATES_HOST_TRANSLATION_MACHINERY",
                       "REACTOME_SARS_COV_1_MODULATES_HOST_TRANSLATION_MACHINERY",
                       "REACTOME_DETOXIFICATION_OF_REACTIVE_OXYGEN_SPECIES",
                       "REACTOME_SARS_COV_2_HOST_INTERACTIONS",
                       "REACTOME_HOST_INTERACTIONS_OF_HIV_FACTORS")
indexes <- match(selected_pathways , df[,'NAME'])
GSEA_positive_Output_Reactome_Selected <- df[indexes,]

ggplot(GSEA_positive_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

GSEA_negative_Output_Reactome <- read.delim("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/gsea_report_for_na_neg_1679677635779.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(GSEA_negative_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

addWorksheet(OUT , 'GSEA_negative_Output_Reactome')
writeData(OUT, sheet = "GSEA_negative_Output_Reactome", x = GSEA_negative_Output_Reactome)

Select Pathways

df <- GSEA_negative_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_SPHINGOLIPID_METABOLISM" ,
                       "REACTOME_PI3K_AKT_SIGNALING_IN_CANCER",
                       "REACTOME_METABOLISM_OF_LIPIDS",
                       "REACTOME_NEGATIVE_REGULATION_OF_THE_PI3K_AKT_NETWORK")
indexes <- match(selected_pathways , df[,'NAME'])
GSEA_negative_Output_Reactome_Selected <- df[indexes,]

ggplot(GSEA_negative_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

C5goBp

GSEA_positive_Output_C5 <- read.delim("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/gsea_report_for_na_pos_1679677560989.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(GSEA_positive_Output_C5 , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('GO Biological Processes')

addWorksheet(OUT , 'GSEA_positive_Output_C5')
writeData(OUT, sheet = "GSEA_positive_Output_C5", x = GSEA_positive_Output_C5)

Select Biological Processes

df <- GSEA_positive_Output_C5
colm <- "NAME"
selected_BP <- c("GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE" ,
                 "GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON",
                 "GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION",
                 "GOBP_HUMORAL_IMMUNE_RESPONSE",
                 "GOBP_REGULATION_OF_SUBSTRATE_ADHESION_DEPENDENT_CELL_SPREADING",
                 "GOBP_CYTOKINE_PRODUCTION",
                 "GOBP_DEFENSE_RESPONSE_TO_OTHER_ORGANISM",
                 "GOBP_INNATE_IMMUNE_RESPONSE",
                 "GOBP_RESPONSE_TO_MOLECULE_OF_BACTERIAL_ORIGIN",
                 "GOBP_INFLAMMATORY_RESPONSE",
                 "GOBP_POSITIVE_REGULATION_OF_IMMUNE_SYSTEM_PROCESS",
                 "GOBP_CELL_ADHESION",
                 "GOBP_ANTIMICROBIAL_HUMORAL_IMMUNE_RESPONSE_MEDIATED_BY_ANTIMICROBIAL_PEPTIDE",
                 "GOBP_COLLAGEN_METABOLIC_PROCESS",
                 "GOBP_TUMOR_NECROSIS_FACTOR_SUPERFAMILY_CYTOKINE_PRODUCTION",
                 "GOBP_POSITIVE_REGULATION_OF_CYTOKINE_PRODUCTION",
                 "GOBP_DEFENSE_RESPONSE_TO_GRAM_POSITIVE_BACTERIUM",
                 "GOBP_EXTRACELLULAR_MATRIX_DISASSEMBLY",
                 "GOBP_DNA_REPAIR")
indexes <- match(selected_BP , df[,colm])
GSEA_positive_Output_C5_Selected <- df[indexes,]

ggplot(GSEA_positive_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

GSEA_negative_Output_C5 <- read.delim("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/gsea_report_for_na_neg_1679677560989.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(GSEA_negative_Output_C5 , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 4)) +
  xlab('GO Biological Processes')

addWorksheet(OUT , 'GSEA_negative_Output_C5')
writeData(OUT, sheet = "GSEA_negative_Output_C5", x = GSEA_negative_Output_C5)

Select Biological Processes

df <- GSEA_negative_Output_C5
colm <- "NAME"
selected_BP <- c("GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS" ,
                 "GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT",
                 "GOBP_SPHINGOLIPID_BIOSYNTHETIC_PROCESS",
                 "GOBP_LIPID_MODIFICATION",
                 "GOBP_MEMBRANE_LIPID_METABOLIC_PROCESS",
                 "GOBP_STEROID_BIOSYNTHETIC_PROCESS",
                 "GOBP_SPHINGOLIPID_METABOLIC_PROCESS",
                 "GOBP_LIPID_METABOLIC_PROCESS",
                 "GOBP_LIPID_OXIDATION",
                 "GOBP_STEROID_METABOLIC_PROCESS",
                 "GOBP_LIPID_BIOSYNTHETIC_PROCESS",
                 "GOBP_REGULATION_OF_NEUROTRANSMITTER_LEVELS",
                 "GOBP_CELLULAR_LIPID_METABOLIC_PROCESS",
                 "GOBP_GLANDULAR_EPITHELIAL_CELL_DIFFERENTIATION",
                 "GOBP_FATTY_ACID_METABOLIC_PROCESS",
                 "GOBP_NEGATIVE_REGULATION_OF_TRANSMEMBRANE_TRANSPORT",
                 "GOBP_HORMONE_TRANSPORT",
                 "GOBP_REGULATION_OF_LIPID_METABOLIC_PROCESS")
indexes <- match(selected_BP , df[,colm])
GSEA_negative_Output_C5_Selected <- df[indexes,]

ggplot(GSEA_negative_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 7)) +
  xlab('Reactome Pathways')

#Proteins - GSEA - Software

#Prpare the gene set 
Protein_input <- as.data.frame(MOFAobject@expectations$W$Protein[,'Factor1'])
colnames(Protein_input) <- 'Weights'
#Rename the Genes to remove _Genes
indexes<- which(grepl(pattern = "_Protein" , x = row.names(Protein_input) ))
Protein_toBeReplaced <- row.names(Protein_input)[indexes]
Protein_toBeReplaced <- gsub(pattern = "_Protein" , "" , Protein_toBeReplaced)
row.names(Protein_input)[indexes] <- Protein_toBeReplaced
#Add Genes Column
Protein_input$Proteins <-row.names(Protein_input)
Protein_input <- Protein_input[,c('Proteins' , 'Weights')]
Protein_input <- Protein_input %>% arrange(desc(Weights))
write.table(Protein_input , sep = "\t",file = 'Protein_input.rnk' , quote = FALSE , row.names = FALSE)

Reactome

Protein_positive_Output_Reactome <- read.delim("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/gsea_report_for_na_pos_1679677623172.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(Protein_positive_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

addWorksheet(OUT , 'Protein_pos_Reactome')
writeData(OUT, sheet = "Protein_pos_Reactome", x = Protein_positive_Output_Reactome)

Select Pathways

df <- Protein_positive_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_CELL_CYCLE_MITOTIC",
                       "REACTOME_CELL_CYCLE",
                       "REACTOME_CELL_CYCLE_CHECKPOINTS",
                       "REACTOME_MITOTIC_G1_PHASE_AND_G1_S_TRANSITION")
indexes <- match(selected_pathways , df[,'NAME'])
Protein_positive_Output_Reactome_Selected <- df[indexes,]

ggplot(Protein_positive_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5)) +
  xlab('Reactome Pathways')

Protein_negative_Output_Reactome <- read.delim("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/gsea_report_for_na_neg_1679677623172.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val) 

ggplot(Protein_negative_Output_Reactome , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8)) +
  xlab('Reactome Pathways')

addWorksheet(OUT , 'Protein_neg_Reactome')
writeData(OUT, sheet = "Protein_neg_Reactome", x = Protein_negative_Output_Reactome)

Select Pathways

df <- Protein_negative_Output_Reactome
colm <- "NAME"
selected_pathways <- c("REACTOME_SIGNALING_BY_INSULIN_RECEPTOR",
                       "REACTOME_INSULIN_RECEPTOR_SIGNALLING_CASCADE")
indexes <- match(selected_pathways , df[,'NAME'])
Protein_negative_Output_Reactome_Selected <- df[indexes,]

ggplot(Protein_negative_Output_Reactome_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 7)) +
  xlab('Reactome Pathways')

C5goBp

Protein_positive_Output_C5 <- read.delim("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/gsea_report_for_na_pos_1679677577693.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(Protein_positive_Output_C5 , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8)) +
  xlab('GO Biological Processes')

addWorksheet(OUT , 'Protein_pos_C5')
writeData(OUT, sheet = "Protein_pos_C5", x = Protein_positive_Output_C5)

Select Biological Processes

df <- Protein_positive_Output_C5
colm <- "NAME"
selected_BP <- c('GOBP_MITOTIC_CELL_CYCLE_PROCESS',
                 'GOBP_RESPONSE_TO_BACTERIUM',
                 'GOBP_RESPONSE_TO_CYTOKINE',
                 'GOBP_PHAGOCYTOSIS',
                 'GOBP_MITOTIC_CELL_CYCLE_CHECKPOINT_SIGNALING',
                 'GOBP_RESPONSE_TO_MOLECULE_OF_BACTERIAL_ORIGIN',
                 'GOBP_LYMPHOCYTE_APOPTOTIC_PROCESS',
                 'GOBP_REGULATION_OF_CELL_CELL_ADHESION',
                 'GOBP_B_CELL_ACTIVATION',
                 'GOBP_CYTOKINE_MEDIATED_SIGNALING_PATHWAY',
                 'GOBP_T_CELL_PROLIFERATION',
                 'GOBP_NEGATIVE_REGULATION_OF_PROTEIN_SERINE_THREONINE_KINASE_ACTIVITY',
                 'GOBP_REGULATION_OF_CELL_CYCLE_G2_M_PHASE_TRANSITION',
                 'GOBP_CELL_CELL_ADHESION',
                 'GOBP_POSITIVE_REGULATION_OF_I_KAPPAB_KINASE_NF_KAPPAB_SIGNALING')
indexes <- match(selected_BP , df[,colm])
Protein_positive_Output_C5_Selected <- df[indexes,]

ggplot(Protein_positive_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 7)) +
  xlab('Reactome Pathways')

Protein_negative_Output_C5 <- read.delim("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/gsea_report_for_na_neg_1679677577693.tsv") %>% filter(NOM.p.val <= 0.05) %>% dplyr::select(NAME , NOM.p.val)

ggplot(Protein_negative_Output_C5 , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 6)) +
  xlab('GO Biological Processes')

addWorksheet(OUT , 'Protein_neg_C5')
writeData(OUT, sheet = "Protein_neg_C5", x = Protein_negative_Output_C5)

Select Biological Processes

df <- Protein_negative_Output_C5
colm <- "NAME"
selected_BP <- c('GOBP_INSULIN_LIKE_GROWTH_FACTOR_RECEPTOR_SIGNALING_PATHWAY',
                 'GOBP_MAMMARY_GLAND_DEVELOPMENT',
                 'GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT',
                 'GOBP_RESPONSE_TO_STEROID_HORMONE',
                 'GOBP_LIPID_BIOSYNTHETIC_PROCESS',
                 'GOBP_CELLULAR_RESPONSE_TO_INSULIN_STIMULUS',
                 'GOBP_LIPID_METABOLIC_PROCESS',
                 'GOBP_INOSITOL_LIPID_MEDIATED_SIGNALING')
indexes <- match(selected_BP , df[,colm])
Protein_negative_Output_C5_Selected <- df[indexes,]

ggplot(Protein_negative_Output_C5_Selected , aes(x = reorder(NAME , - NOM.p.val)  , y = -log(NOM.p.val + 0.5) , fill = -log(NOM.p.val + 0.5))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 7)) +
  xlab('Reactome Pathways')

#miRNA Overrepresentation Grouping by MOFA factor

cluster_f1<- cluster_samples(object = MOFAobject, factors = 1, k = 2)
cluster_f1_a<- as.data.frame (cluster_f1[["cluster"]])
colnames(cluster_f1_a)<- "group"
 Z_data<- get_expectations(MOFAobject, variable = "Z")

# rename groups
cluster_f1_a$groups <- ifelse(cluster_f1_a$group==  1, "Negative", "Positive")

Expression matrix

mi_indx<- match (rownames(cluster_f1_a), colnames(miRNA.data))
mi_mtrx<- miRNA.data[miRNA_Features, mi_indx]
cond1="Negative" 
cond2="Positive"
# perform deseq analysis
dds = DESeqDataSetFromMatrix( countData = mi_mtrx, colData = cluster_f1_a , design = ~ groups )
dds.run = DESeq(dds)

res=results(dds.run, contrast = c("groups" ,cond2, cond1) )
# remove nulls
res=res[complete.cases(res), ]
summary(res)
mi_res.df=as.data.frame(res)
write.table(mi_res.df, file = "res_mi_RNA.txt")
 ## 3. Selection for DEMS
dems.res_up =mi_res.df[mi_res.df$padj< 0.05 & mi_res.df$log2FoldChange > 0.5,]
dim(dems.res_up)
dems.up=dems.res_up[order(dems.res_up$padj), ]
dems_up.=rownames(dems.up)
dems.res_dwn =mi_res.df[mi_res.df$padj< 0.05 & mi_res.df$log2FoldChange < -0.5,]
dim(dems.res_dwn)
dems.dwn=dems.res_dwn[order(dems.res_dwn$padj), ]
dems.d=rownames(dems.res_dwn)
#  export them to a miRNA list to start the miRNA over representation analysis.
save( dems_up., dems.d , mi_res.df, file= "mi_res_dems.RDATA")
write.table(dems_up., file = "mi_dems_up.txt", quote = FALSE , row.names = FALSE, col.names =  FALSE)
write.table(dems.d, file = "mi_dems_dwn.txt", quote = FALSE , row.names = FALSE, col.names =  FALSE)

TAM2-UP - Visualization

TAM2_OUTPUT_UP <- read.delim("miRNA_UP_TAM2.txt")
table(TAM2_OUTPUT_UP$Category)
## 
##     Cell Specificity              Cluster              Disease 
##                  101                   41                  420 
##               Family             Function   Tissue Specificity 
##                   29                  105                   21 
## Transcription Factor 
##                   49
TAM2_OUTPUT_UP_Function <- TAM2_OUTPUT_UP %>% filter(Category == 'Function') %>% dplyr::select(Term , FDR) %>% arrange(FDR)%>% filter(FDR < 0.05)

ggplot(TAM2_OUTPUT_UP_Function , aes(x = reorder(Term , -FDR)  , y = -log(FDR)  , fill = -log(FDR))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'TAM2_OUTPUT_UP_Function')
writeData(OUT, sheet = "TAM2_OUTPUT_UP_Function", x = TAM2_OUTPUT_UP_Function)

Select Functions

df <- TAM2_OUTPUT_UP_Function
colm <- "Term"
selected_Fn <- c("Immune Response",
                 "Inflammation",
                 "Immune System(Xiao's Cell2010)",
                 "Cell Proliferation",
                 "Cell Cycle",
                 "Regulation of Akt Pathway",
                 "Response to Hypoxia",
                 "T-Cell Activation",
                 "Neutrophil Differentiation")
indexes <- match(selected_Fn , df[,colm])
TAM2_OUTPUT_UP_Function_Selected <- df[indexes,]

ggplot(TAM2_OUTPUT_UP_Function_Selected , aes(x = reorder(Term , -FDR)  , y = -log(FDR)  , fill = -log(FDR))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

TAM2-DOWN - Visualization

TAM2_OUTPUT_DOWN <- read.delim("miRNA_DOWN_TAM2.txt")
table(TAM2_OUTPUT_DOWN$Category)
## 
##     Cell Specificity              Cluster              Disease 
##                   68                   30                  348 
##               Family             Function   Tissue Specificity 
##                   23                   92                   17 
## Transcription Factor 
##                   37
TAM2_OUTPUT_DOWN_Function <- TAM2_OUTPUT_DOWN %>% filter(Category == 'Function') %>% dplyr::select(Term , FDR) %>% arrange(FDR)%>% filter(FDR < 0.05)

ggplot(TAM2_OUTPUT_DOWN_Function , aes(x = reorder(Term , -FDR)  , y = -log(FDR)  , fill = -log(FDR))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'TAM2_OUTPUT_DOWN_Function')
writeData(OUT, sheet = "TAM2_OUTPUT_DOWN_Function", x = TAM2_OUTPUT_DOWN_Function)

Select Functions

df <- TAM2_OUTPUT_DOWN_Function
colm <- "Term"
selected_Fn <- c("Cell Cycle",
                 "Cell Differentiation",
                 "Epithelial-to-Mesenchymal Transition",
                 "Adipocyte Differentiation",
                 "Lipid Metabolism",
                 "Cell Proliferation",
                 "T-Cell Differentiation",
                 "Extracellular Matrix Remodeling",
                 "Inflammation",
                 "Hormone-mediated Signaling Pathway ")
indexes <- match(selected_Fn , df[,colm])
TAM2_OUTPUT_DOWN_Function_Selected <- df[indexes,]

ggplot(TAM2_OUTPUT_DOWN_Function_Selected , aes(x = reorder(Term , -FDR)  , y = -log(FDR)  , fill = -log(FDR))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

#lncRNA Overrepresentation

#Prpare the gene set 
lnc_input <- as.data.frame(MOFAobject@expectations$W$lncRNA[,'Factor1'])
colnames(lnc_input) <- 'Weights'
lnc_input <- lnc_input %>% arrange(desc(Weights))
lnc_input$probe_ID <- row.names(lnc_input)
lnc_input <- lnc_input[,c('probe_ID','Weights')]
write.table(lnc_input , file = "lnc_arranged.tsv" , quote = FALSE , row.names = FALSE , col.names = TRUE)

expression matrix

ln_indx<- match (rownames(cluster_f1_a), colnames(lncRNA.data))
lnc_mtrx<- lncRNA.data[lncRNA_Features, ln_indx]
lnc_mtrx <- lnc_mtrx[complete.cases(lnc_mtrx),]
cond1="Negative" 
cond2="Positive"
# perform deseq analysis
dds = DESeqDataSetFromMatrix( countData = lnc_mtrx, colData = cluster_f1_a , design = ~ groups )
dds.run = DESeq(dds)

res=results(dds.run, contrast = c("groups",cond2 ,cond1) )
# remove nulls
res=res[complete.cases(res), ]
summary(res)
res.df=as.data.frame(res)
write.table(res.df, file = "res_lncRNA.txt")
 ## 3. Selection for DEMS

lnc.res_up =res.df[res.df$padj< 0.05 & res.df$log2FoldChange > 1,]
dim(lnc.res_up)
lnc.up=lnc.res_up[order(lnc.res_up$padj), ]
lnc_up.=rownames(lnc.up)
lnc.res_dwn =res.df[res.df$padj< 0.05 & res.df$log2FoldChange < -1,]
dim(lnc.res_dwn)
lnc.dwn=lnc.res_dwn[order(lnc.res_dwn$padj), ]
lnc.d=rownames(lnc.res_dwn)
#  export them to a miRNA list to start the miRNA over representation analysis.
save( lnc_up., lnc.d,res.df, file= "res_lnc.RDATA")
write.table(lnc_up., file = "lnc_up.txt", quote = FALSE , row.names = FALSE, col.names =  FALSE)
write.table(lnc.d, file = "lnc.d.txt", quote = FALSE , row.names = FALSE, col.names =  FALSE)

lnc_Enrichment Visualization

Panther - Reactome - UP

Panther_Reactome_UP <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/panter_lnc_reacto_up.txt", 
    delim = "\t", escape_double = FALSE, 
    trim_ws = TRUE, skip = 11)
## Rows: 32 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): Reactome pathways, Client Text Box Input (over/under)
## dbl (6): Homo sapiens - REFLIST (20589), Client Text Box Input (251), Client...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_Reactome_UP)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_Reactome_UP <- Panther_Reactome_UP %>% dplyr::select(`Reactome pathways` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.05)

ggplot(Panther_Reactome_UP , aes(x = reorder(`Reactome pathways` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'Panther_Reactome_UP')
writeData(OUT, sheet = "Panther_Reactome_UP", x = Panther_Reactome_UP)

Select Reactome Pathways

df <- Panther_Reactome_UP
selected_Fn <- c("Nucleosome assembly (R-HSA-774815)",
                 "Deposition of new CENPA-containing nucleosomes at the centromere (R-HSA-606279)",
                 "Packaging Of Telomere Ends (R-HSA-171306)",
                 "DNA methylation (R-HSA-5334118)",
                 "Depurination (R-HSA-73927)",
                 "Activated PKN1 stimulates transcription of AR (androgen receptor) regulated genes KLK2 and KLK3 (R-HSA-5625886)",
                 "Cleavage of the damaged purine (R-HSA-110331)",
                 "Recognition and association of DNA glycosylase with site containing an affected purine (R-HSA-110330)",
                 "SIRT1 negatively regulates rRNA expression (R-HSA-427359)",
                 "Assembly of the ORC complex at the origin of replication (R-HSA-68616)",
                 "Activation of HOX genes during differentiation (R-HSA-5619507)")
indexes <- match(selected_Fn , df$`Reactome pathways`)
Panther_Reactome_UP_Selected <- df[indexes,]

ggplot(Panther_Reactome_UP_Selected , aes(x = reorder(`Reactome pathways` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 5))

Panther - Reactome - DOWN

Panther_Reactome_DOWN <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/panther_lnc_dwn_reactome.txt", 
    delim = "\t", escape_double = FALSE, 
    trim_ws = TRUE, skip = 11)
## Rows: 2493 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): Reactome pathways, Client Text Box Input (over/under), Client Text ...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (228), Client...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_Reactome_DOWN)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_Reactome_DOWN <- Panther_Reactome_DOWN %>% dplyr::select(`Reactome pathways` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.05)

ggplot(Panther_Reactome_DOWN , aes(x = reorder(`Reactome pathways` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'Panther_Reactome_DOWN')
writeData(OUT, sheet = "Panther_Reactome_DOWN", x = Panther_Reactome_DOWN)

Select Reactome Pathways

df <- Panther_Reactome_DOWN
selected_Fn <- c("Insulin processing (R-HSA-264876)",
                 "Deubiquitination (R-HSA-5688426)",
                 "RUNX1 regulates transcription of genes involved in WNT signaling (R-HSA-8939256)",
                 "Defective HK1 causes hexokinase deficiency (HK deficiency) (R-HSA-5619056)",
                 "Degradation of the extracellular matrix (R-HSA-1474228)",
                 "Transcriptional Regulation by MECP2 (R-HSA-8986944)",
                 "Insulin processing (R-HSA-264876)",
                 "Interferon alpha/beta signaling (R-HSA-909733)")
indexes <- match(selected_Fn , df$`Reactome pathways`)
Panther_Reactome_DOWN_Selected <- df[indexes,]

ggplot(Panther_Reactome_DOWN_Selected , aes(x = reorder(`Reactome pathways` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

mammary gland epithelial cell proliferation (GO:0033598) hormone metabolic process (GO:0042445) Panther - BP - UP

Panther_BP_UP <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/panther_lncRNA_bp_up.txt", 
    delim = "\t", escape_double = FALSE, 
    trim_ws = TRUE, skip = 11)
## Rows: 15681 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): GO biological process complete, Client Text Box Input (over/under),...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (251), Client...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_BP_UP)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_BP_UP <- Panther_BP_UP %>% dplyr::select(`GO biological process complete` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.01)

ggplot(Panther_BP_UP , aes(x = reorder(`GO biological process complete` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'Panther_BP_UP')
writeData(OUT, sheet = "Panther_BP_UP", x = Panther_BP_UP)

Panther - BP - DOWN

Panther_BP_DOWN <- read_delim("Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/panther_lnc_dwn_GObp.txt", 
    delim = "\t", escape_double = FALSE, 
    trim_ws = TRUE, skip = 11)
## Rows: 15681 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): GO biological process complete, Client Text Box Input (over/under),...
## dbl (5): Homo sapiens - REFLIST (20589), Client Text Box Input (228), Client...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(Panther_BP_DOWN)[c(7 , 8)] <- c('P.val' , 'FDR')
Panther_BP_DOWN <- Panther_BP_DOWN %>% dplyr::select(`GO biological process complete` , P.val) %>% arrange(P.val)%>% filter(P.val < 0.01)

ggplot(Panther_BP_DOWN , aes(x = reorder(`GO biological process complete` , -P.val)  , y = -log(P.val)  , fill = -log(P.val))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'Panther_BP_DOWN')
writeData(OUT, sheet = "Panther_BP_DOWN", x = Panther_BP_DOWN)

Topfun - Enrich - up

topfun_enrich_UP <- read.delim('Enrichment_Results/lnc_Enrichment/lncRNA_UP_erich/toppfun_lnc_up_enrich.txt')
table(topfun_enrich_UP$Category)
## 
##     Coexpression Atlas          Computational               Cytoband 
##                    117                     53                     79 
##                Disease            Gene Family GO: Biological Process 
##                      1                      4                     10 
## GO: Cellular Component GO: Molecular Function            Interaction 
##                      8                      1                     62 
##        Mouse Phenotype                Pathway                 Pubmed 
##                     42                     12                   2137 
##         ToppCell Atlas 
##                    262
topfun_enrich_UP_BP <- topfun_enrich_UP %>% filter(Category == "GO: Biological Process") %>% dplyr::select(Name , p.value)

ggplot(topfun_enrich_UP_BP , aes(x = reorder(`Name` , -p.value)  , y = -log(p.value)  , fill = -log(p.value))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'topfun_enrich_UP_BP')
writeData(OUT, sheet = "topfun_enrich_UP_BP", x = topfun_enrich_UP_BP)

Topfun - Enrich - DOWN

topfun_enrich_DOWN <- read.delim('Enrichment_Results/lnc_Enrichment/lncRNA_DOWN_enrich/topfun_lnc_dwn.txt')
table(topfun_enrich_DOWN$Category)
## 
##           Coexpression     Coexpression Atlas          Computational 
##                      2                     44                     10 
##               Cytoband            Gene Family GO: Biological Process 
##                      3                      1                     25 
##            Interaction                Pathway                 Pubmed 
##                     19                      1                    326 
##         ToppCell Atlas 
##                      5
topfun_enrich_DOWN_BP <- topfun_enrich_DOWN %>% filter(Category == "GO: Biological Process") %>% dplyr::select(Name , p.value)

ggplot(topfun_enrich_DOWN_BP , aes(x = reorder(`Name` , -p.value)  , y = -log(p.value)  , fill = -log(p.value))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  theme(axis.text  = element_text(size = 8))

addWorksheet(OUT , 'topfun_enrich_DOWN_BP')
writeData(OUT, sheet = "topfun_enrich_DOWN_BP", x = topfun_enrich_DOWN_BP)

#Save Final Results

saveWorkbook(OUT, "Enrichment_Results/Final_Tables/Final_Enrichment.xlsx")
library(survival)
library(survminer)
## Loading required package: ggpubr
## Warning: package 'ggpubr' was built under R version 4.2.2
## 
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
## 
##     myeloma
Total_pheno$OS.time <- as.numeric(Total_pheno$OS.time)
Total_pheno$deceased = Total_pheno$vital_status == "Dead"
Total_pheno$Factor1 <-ifelse(Zmatrix$group1.Factor1 > 0 , "Pos" , "Neg")
fit =survival::survfit(survival::Surv(OS.time,deceased ) ~ Factor1 , data = Total_pheno)
survminer::ggsurvplot(fit, data=Total_pheno,
                                          pval=T,
                                          risk.table=T,
                                          risk.table.col="strata",
                                          title = stringr::str_glue('Level'),
                                          legend.title = "no. of Samples",
                                          xlab = "Time by Days"
                                          )

#Discussion The best MOFA factor was factor (1), which has the most variance contribution from each omic, followed by factor (3). There is no correlation detected between the generated factors. Factor (1) clearly differentiated between breast cancer subtypes. The basal was positive to factor one, while the luminal was negative.

#a. Features positive to factor one The resulted enriched pathways from the four omics were mainly related to four main terms:

(In this section we are representing examples of commonly enriched pathways, biological processes or Functions related to each module by exploring the highly enriched features)

#1. Immune modulation. #mRNA #First Pathway: REACTOME_INTERFERON_GAMMA_SIGNALING

knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_INTERFERON_GAMMA_SIGNALING_241.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('GBP1', 'TRIM29', "GBP5" , 'VCAM1', 'HLA-E', 'HLA-A','SOCS3'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "GBP1",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('GBP1', 'TRIM29', "GBP5" , 'VCAM1', 'HLA-E', 'HLA-A','SOCS3') ,
 scale = F           # Scale weights from -1 to 1
)

#Second Pathway: REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM

knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM_273.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('FSCN1','SOD2_Genes', 'MSN', 'ANXA1_Genes','PSME4', 'TRIM29', 'FLNA', 'VIM'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "FSCN1",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('FSCN1','SOD2_Genes', 'MSN', 'ANXA1_Genes','PSME4', 'TRIM29', 'FLNA', 'VIM') ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#First Biological Process: GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_CELLULAR_RESPONSE_TO_TYPE_II_INTERFERON_99.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('GBP1', 'ACTR3', "CX3CL1" , 'VIM', 'SYNCRIP', 'ASS1','GAPDH_Genes'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "ACTR3",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('GBP1', 'ACTR3', "CX3CL1" , 'VIM', 'SYNCRIP', 'ASS1','GAPDH_Genes') ,
 scale = F           # Scale weights from -1 to 1
)

#Second Biological Process: GOBP_HUMORAL_IMMUNE_RESPONSE

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_HUMORAL_IMMUNE_RESPONSE_119.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('PI3', 'S100A9', "KRT6A" , 'KLK7', 'SLPI', 'GAPDH_Genes','KLK5'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "S100A9",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('PI3', 'S100A9', "KRT6A" , 'KLK7', 'SLPI', 'GAPDH_Genes','KLK5') ,
 scale = F           # Scale weights from -1 to 1
)

#Protein #First Biological Process: GOBP_RESPONSE_TO_CYTOKINE

knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_RESPONSE_TO_CYTOKINE_37.png")

plot_data_scatter(MOFAobject, 
  view = "Protein",
  factor = 1,  
  features = c('LYN', 'SIRPA', "TP53" , 'SYK', 'CD274', 'CD44_Protein','PAX6' , 'KIT_Protein' , "TFRC_Protein"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="Protein expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "LYN",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 1,
 manual = c('LYN', 'SIRPA', "TP53" , 'SYK', 'CD274', 'CD44_Protein','PAX6' , 'KIT_Protein' , "TFRC_Protein") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Second Pathway: REACTOME_CYTOKINE_SIGNALING_IN_IMMUNE_SYSTEM

knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_LYMPHOCYTE_APOPTOTIC_PROCESS_29.png")

plot_data_scatter(MOFAobject, 
  view = "Protein",
  factor = 1,  
  features = c('IDO1_Protein','CASP7', 'AURKB', 'TP53','CHEK2'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="Protein expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "AURKB",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 1,
 manual = c('IDO1_Protein','CASP7', 'AURKB', 'TP53','CHEK2') ,
 scale = F           # Scale weights from -1 to 1
)

#miRNA #First Function: Immune Response

plot_data_scatter(MOFAobject, 
  view = "miRNA",
  factor = 1,  
  features = c('hsa-mir-186','hsa-mir-203a','hsa-mir-362','hsa-mir-9-1',
               'hsa-mir-92a-1','hsa-mir-203b','hsa-mir-146b','hsa-mir-196b',
               'hsa-mir-9-2','hsa-mir-210'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="miRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "hsa-mir-186",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 1,
 manual = c('hsa-mir-186','hsa-mir-203a','hsa-mir-362','hsa-mir-9-1',
               'hsa-mir-92a-1','hsa-mir-203b','hsa-mir-146b','hsa-mir-196b',
               'hsa-mir-9-2','hsa-mir-210') ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Second Function: T-Cell Activation

plot_data_scatter(MOFAobject, 
  view = "miRNA",
  factor = 1,  
  features = c("hsa-mir-223","hsa-mir-146a","hsa-mir-18a","hsa-mir-155","hsa-mir-31"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="miRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "hsa-mir-223",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 1,
 manual = c("hsa-mir-223","hsa-mir-146a","hsa-mir-18a","hsa-mir-155","hsa-mir-31") ,
 scale = F           # Scale weights from -1 to 1
)

#2. Cell cycle and Mitotic division. #mRNA #REACTOME_CELL_CYCLE_MITOTIC

#Protein #First Pathway: REACTOME_CELL_CYCLE_MITOTIC

knitr::include_graphics("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/enplot_REACTOME_CELL_CYCLE_MITOTIC_161.png")

plot_data_scatter(MOFAobject, 
  view = "Protein",
  factor = 1,  
  features = c('CCNE1', 'LYN', "FOXM1" , 'CCNB1', 'CDK1', 'PLK1','AURKB' , 'TP53' , "AURKA"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "CCNE1",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 1,
 manual = c('CCNE1', 'LYN', "FOXM1" , 'CCNB1', 'CDK1', 'PLK1','AURKB' , 'TP53' , "AURKA") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Second Pathway: REACTOME_CELL_CYCLE_CHECKPOINTS

knitr::include_graphics("Enrichment_Results/Protein_Reactome.GseaPreranked.1679677623172/enplot_REACTOME_CELL_CYCLE_CHECKPOINTS_165.png")

#miRNA #First Function: Cell Cycle

plot_data_scatter(MOFAobject, 
  view = "miRNA",
  factor = 1,  
  features = c("hsa-mir-24-1","hsa-mir-19b-1","hsa-mir-9-1","hsa-mir-221","hsa-mir-20a","hsa-mir-17","hsa-mir-92a-1","hsa-mir-19b-2","hsa-mir-31"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="miRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = 'hsa-mir-24-1',
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 1,
 manual = c("hsa-mir-24-1","hsa-mir-19b-1","hsa-mir-9-1","hsa-mir-221","hsa-mir-20a","hsa-mir-17","hsa-mir-92a-1","hsa-mir-19b-2","hsa-mir-31") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#3. Cell migration #mRNA #First Biological Process: GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_POSITIVE_REGULATION_OF_MONONUCLEAR_CELL_MIGRATION_113.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('CXCL10', 'LGMN', "CALR" , 'CCL5', 'CD47', 'APP','S100A7'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "CXCL10",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('CXCL10', 'LGMN', "CALR" , 'CCL5', 'CD47', 'APP','S100A7') ,
 scale = F           # Scale weights from -1 to 1
)

#Protein #First Biological Process

#miRNA #First Function:

  1. Response to microbial infection. #mRNA #First Pathway: REACTOME_INTERFERON_GAMMA_SIGNALING
knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_INFLUENZA_INFECTION_265.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('KPNA4', 'RPS7', "KPNA2" , 'CALR', 'RPS27A', 'IPO5','RPS19', "RAN" , "RPS12"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "KPNA4",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('KPNA4', 'RPS7', "KPNA2" , 'CALR', 'RPS27A', 'IPO5','RPS19', "RAN" , "RPS12") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Second Pathway: GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_ANTIMICROBIAL_HUMORAL_RESPONSE_83.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('PI3','S100A9', 'KRT6A', 'KLK7','SLPI', 'GAPDH_Genes', 'KLK5', 'CXCL10'),
  sign = "positive",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('PI3','S100A9', 'KRT6A', 'KLK7','SLPI', 'GAPDH_Genes', 'KLK5', 'CXCL10') ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Protein #First Biological Process: GOBP_RESPONSE_TO_BACTERIUM

knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_RESPONSE_TO_BACTERIUM_3.png")

plot_data_scatter(MOFAobject, 
  view = "Protein",
  factor = 1,  
  features = c('LYN', 'IDO1_Protein', "SIRPA" , 'CASP7', 'SOD2_Protein', 'SYK','CD274' , 'PRKCA' , "CD4"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="Protein expression")

plot_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 1,
 manual = c('LYN', 'IDO1_Protein', "SIRPA" , 'CASP7', 'SOD2_Protein', 'SYK','CD274' , 'PRKCA' , "CD4") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#miRNA #First Function: Response to Hypoxia

plot_data_scatter(MOFAobject, 
  view = "miRNA",
  factor = 1,  
  features = c("hsa-mir-137","hsa-mir-210","hsa-mir-138-1","hsa-mir-155","hsa-mir-138-2","hsa-mir-18a","hsa-mir-196b"),
  sign = "positive",
  color_by = "IMS"
) + labs(y="miRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = 'hsa-mir-138-2',
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 1,
 manual = c("hsa-mir-137","hsa-mir-210","hsa-mir-138-1","hsa-mir-155","hsa-mir-138-2","hsa-mir-18a","hsa-mir-196b") ,
 scale = F           # Scale weights from -1 to 1
)

#b. Features Negative to factor one #1. Lipid metabolism #mRNA #First Pathway: REACTOME_SPHINGOLIPID_METABOLISM

knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_SPHINGOLIPID_METABOLISM_283.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('DEGS2', 'UGCG', "CERS4" , 'ASAH1', 'ARSD', 'CERS6','SPTLC2'),
  sign = "negative",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "DEGS2",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('DEGS2', 'UGCG', "CERS4" , 'ASAH1', 'ARSD', 'CERS6','SPTLC2') ,
 scale = F           # Scale weights from -1 to 1
)

#Second Pathway: REACTOME_METABOLISM_OF_LIPIDS

knitr::include_graphics("Enrichment_Results/GSEA_Reactome.GseaPreranked.1679677635779/enplot_REACTOME_METABOLISM_OF_LIPIDS_311.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('SLC44A4', 'DEGS2', "INPP4B_Genes" , 'UGCG', 'STARD10', 'CYP4B1','HSD17B4', "HACD3" , "ELOVL5"),
  sign = "negative",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "SLC44A4",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('SLC44A4', 'DEGS2', "INPP4B_Genes" , 'UGCG', 'STARD10', 'CYP4B1','HSD17B4', "HACD3" , "ELOVL5") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#First Biologoical Process: GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_MEMBRANE_LIPID_BIOSYNTHETIC_PROCESS_121.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('DEGS2', 'UGCG', "HACD3" , 'ELOVL5', 'SCCPDH', 'CERS4','ASAH1'),
  sign = "negative",
  color_by = "IMS"
) + labs(y="mRNA expression")

#Protein #First Biological Process: GOBP_LIPID_METABOLIC_PROCESS

#miRNA #First Function: Lipid Metabolism

plot_data_scatter(MOFAobject, 
  view = "miRNA",
  factor = 1,  
  features = c("hsa-mir-125a","hsa-mir-29c","hsa-mir-33b","hsa-mir-103a-2","hsa-mir-103a-1","hsa-mir-10b","hsa-mir-196a-2","hsa-mir-196a-1","hsa-mir-375"),
  sign = "negative",
  color_by = "IMS"
) + labs(y="miRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = 'hsa-mir-29c',
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "miRNA",
 factor = c(1),
 nfeatures = 1,
 manual = c("hsa-mir-125a","hsa-mir-29c","hsa-mir-33b","hsa-mir-103a-2","hsa-mir-103a-1","hsa-mir-10b","hsa-mir-196a-2","hsa-mir-196a-1","hsa-mir-375") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

2. Hormone related pathways #mRNA #First Biological Process: GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT

knitr::include_graphics("Enrichment_Results/GSEA_C5gobp.GseaPreranked.1679677560989/enplot_GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT_123.png")

plot_data_scatter(MOFAobject, 
  view = "Genes",
  factor = 1,  
  features = c('ESR1_Genes', 'ERBB4', "GATA3_Genes" , 'AR_Genes', 'PGR_Genes', 'ZNF703','PRLR' , 'CCND1_Genes' , "TBX3"),
  sign = "negative",
  color_by = "IMS"
) + labs(y="mRNA expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "ESR1_Genes",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Genes",
 factor = c(1),
 nfeatures = 1,
 manual = c('ESR1_Genes', 'ERBB4', "GATA3_Genes" , 'AR_Genes', 'PGR_Genes', 'ZNF703','PRLR' , 'CCND1_Genes' , "TBX3") ,
 scale = F           # Scale weights from -1 to 1
)
## Warning in RColorBrewer::brewer.pal(n = length(manual) + 1, "Dark2"): n too large, allowed maximum for palette Dark2 is 8
## Returning the palette you asked for with that many colors

#Protein #First Biological Process: GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT

knitr::include_graphics("Enrichment_Results/Protein_C5gobp.GseaPreranked.1679677577693/enplot_GOBP_MAMMARY_GLAND_EPITHELIUM_DEVELOPMENT_43.png")

plot_data_scatter(MOFAobject, 
  view = "Protein",
  factor = 1,  
  features = c('ESR1_Protein', 'GATA3_Protein', "AR_Protein" , 'PGR_Protein', 'MAPK1_Protein'),
  sign = "negative",
  color_by = "IMS"
) + labs(y="Protein expression")

plot_factor(MOFAobject, 
            factors = c(1), 
            color_by = "ESR1_Protein",
            add_violin = T,
            group_by = "IMS",
            show_missing = T,
            add_boxplot = T,
            scale = T,
            dot_size = 3,
            dodge = T,
            stroke = 0.1
)

plot_weights(MOFAobject,
 view = "Protein",
 factor = c(1),
 nfeatures = 1,
 manual = c('ESR1_Protein', 'GATA3_Protein', "AR_Protein" , 'PGR_Protein', 'MAPK1_Protein') ,
 scale = F           # Scale weights from -1 to 1
)

sessionInfo()
## R version 4.2.0 (2022-04-22 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] survminer_0.4.9             ggpubr_0.5.0               
##  [3] survival_3.3-1              biomaRt_2.52.0             
##  [5] rstatix_0.7.2               data.table_1.14.6          
##  [7] MOFAdata_1.14.0             MOFA2_1.6.0                
##  [9] openxlsx_4.2.5.2            ggupset_0.3.0              
## [11] ComplexUpset_1.3.3          TCGAbiolinks_2.24.3        
## [13] ggrepel_0.9.2               umap_0.2.9.0               
## [15] ggfortify_0.4.15            ComplexHeatmap_2.12.1      
## [17] forcats_1.0.0               stringr_1.5.0              
## [19] dplyr_1.0.10                purrr_0.3.5                
## [21] tidyr_1.2.1                 tibble_3.1.8               
## [23] ggplot2_3.4.0               tidyverse_1.3.2            
## [25] DESeq2_1.36.0               SummarizedExperiment_1.26.1
## [27] MatrixGenerics_1.8.1        matrixStats_0.63.0         
## [29] GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
## [31] IRanges_2.30.1              S4Vectors_0.34.0           
## [33] vsn_3.64.0                  Biobase_2.56.0             
## [35] BiocGenerics_0.42.0         readr_2.1.3                
## [37] readxl_1.4.1               
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                  reticulate_1.26            
##   [3] tidyselect_1.2.0            RSQLite_2.2.20             
##   [5] AnnotationDbi_1.58.0        BiocParallel_1.30.4        
##   [7] Rtsne_0.16                  munsell_0.5.0              
##   [9] codetools_0.2-18            preprocessCore_1.58.0      
##  [11] withr_2.5.0                 colorspace_2.0-3           
##  [13] filelock_1.0.2              highr_0.10                 
##  [15] knitr_1.42                  rstudioapi_0.14            
##  [17] ggsignif_0.6.4              labeling_0.4.2             
##  [19] GenomeInfoDbData_1.2.8      KMsurv_0.1-5               
##  [21] mnormt_2.1.1                bit64_4.0.5                
##  [23] farver_2.1.1                pheatmap_1.0.12            
##  [25] rhdf5_2.40.0                downloader_0.4             
##  [27] basilisk_1.8.1              vctrs_0.5.1                
##  [29] generics_0.1.3              xfun_0.37                  
##  [31] timechange_0.2.0            BiocFileCache_2.4.0        
##  [33] markdown_1.5                R6_2.5.1                   
##  [35] doParallel_1.0.17           clue_0.3-63                
##  [37] locfit_1.5-9.7              bitops_1.0-7               
##  [39] rhdf5filters_1.8.0          cachem_1.0.6               
##  [41] reshape_0.8.9               DelayedArray_0.22.0        
##  [43] assertthat_0.2.1            vroom_1.6.1                
##  [45] scales_1.2.1                googlesheets4_1.0.1        
##  [47] gtable_0.3.1                affy_1.74.0                
##  [49] rlang_1.0.6                 genefilter_1.78.0          
##  [51] GlobalOptions_0.1.2         splines_4.2.0              
##  [53] gargle_1.3.0                broom_1.0.3                
##  [55] abind_1.4-5                 BiocManager_1.30.19        
##  [57] yaml_2.3.6                  reshape2_1.4.4             
##  [59] modelr_0.1.10               backports_1.4.1            
##  [61] gridtext_0.1.5              tools_4.2.0                
##  [63] psych_2.2.9                 affyio_1.66.0              
##  [65] ellipsis_0.3.2              jquerylib_0.1.4            
##  [67] RColorBrewer_1.1-3          Rcpp_1.0.9                 
##  [69] plyr_1.8.8                  progress_1.2.2             
##  [71] zlibbioc_1.42.0             RCurl_1.98-1.9             
##  [73] basilisk.utils_1.8.0        prettyunits_1.1.1          
##  [75] openssl_2.0.5               GetoptLong_1.0.5           
##  [77] cowplot_1.1.1               zoo_1.8-11                 
##  [79] haven_2.5.1                 cluster_2.1.3              
##  [81] fs_1.6.0                    magrittr_2.0.3             
##  [83] RSpectra_0.16-1             circlize_0.4.15            
##  [85] reprex_2.0.2                googledrive_2.0.0          
##  [87] hms_1.1.2                   patchwork_1.1.2            
##  [89] TCGAbiolinksGUI.data_1.16.0 evaluate_0.20              
##  [91] xtable_1.8-4                XML_3.99-0.13              
##  [93] gridExtra_2.3               shape_1.4.6                
##  [95] compiler_4.2.0              crayon_1.5.2               
##  [97] htmltools_0.5.4             mgcv_1.8-40                
##  [99] tzdb_0.3.0                  ggtext_0.1.2               
## [101] geneplotter_1.74.0          lubridate_1.9.1            
## [103] DBI_1.1.3                   corrplot_0.92              
## [105] dbplyr_2.3.0                rappdirs_0.3.3             
## [107] Matrix_1.5-3                car_3.1-1                  
## [109] cli_3.5.0                   parallel_4.2.0             
## [111] km.ci_0.5-6                 pkgconfig_2.0.3            
## [113] dir.expiry_1.4.0            xml2_1.3.3                 
## [115] foreach_1.5.2               annotate_1.74.0            
## [117] bslib_0.4.2                 XVector_0.36.0             
## [119] rvest_1.0.3                 digest_0.6.30              
## [121] Biostrings_2.64.1           rmarkdown_2.20             
## [123] cellranger_1.1.0            survMisc_0.5.6             
## [125] uwot_0.1.14                 curl_5.0.0                 
## [127] commonmark_1.8.1            rjson_0.2.21               
## [129] nlme_3.1-157                lifecycle_1.0.3            
## [131] jsonlite_1.8.3              Rhdf5lib_1.18.2            
## [133] carData_3.0-5               askpass_1.1                
## [135] limma_3.52.4                fansi_1.0.3                
## [137] pillar_1.8.1                lattice_0.20-45            
## [139] GGally_2.1.2                KEGGREST_1.36.3            
## [141] fastmap_1.1.0               httr_1.4.4                 
## [143] glue_1.6.2                  zip_2.2.2                  
## [145] png_0.1-8                   iterators_1.0.14           
## [147] bit_4.0.5                   stringi_1.7.8              
## [149] sass_0.4.5                  HDF5Array_1.24.2           
## [151] blob_1.2.3                  memoise_2.0.1